
Conversation

@xinaesthete (Contributor) commented Dec 12, 2025

zarrextra has been updated to process what I think is our current understanding of zarr v3 consolidated metadata. Along with this, the schema used for RasterAttrs has been changed so it should support outputs from spatialdata 0.6 - at least as far as the { ome: { multiscales: [] } } vs { multiscales: [] } nesting is concerned; I may well be missing any number of other things.

Should fix #4

Some fairly extensive testing has been added.

  • Python/uv environments for SpatialData 0.5.0, 0.6.1 & 0.7.0. I'm not necessarily intending to keep a whole matrix of every version in this way.
  • Generation of test fixtures using spatialdata.datasets.blobs from those versions of the Python library.
  • Proxy server for use locally to view datasets hosted publicly without CORS headers.
  • Scripts for validating the behaviour of different implementations against datasets from the spatialdata examples, and generating reports. (NB: Python validation removed for now.)

On the JS side, this testing could probably be expanded to include walking each element and checking that it can actually load. I don't want too many massively heavy tests in CI, but some version of that with blobs would be worthwhile, pending getting all of those tests to actually work properly in the CI environment.
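
As a rough illustration only (not code from this PR): assuming readZarr from @spatialdata/core resolves to a SpatialData instance whose rootStore.tree is the parsed tree described below, and that array leaves expose a get() accessor, such a walk could look something like this:

```ts
import { readZarr } from '@spatialdata/core';

// Sketch: recursively visit the tree and try to open every array leaf.
// Assumptions: rootStore.tree is a nested plain object; array nodes carry a
// get() that opens the zarr array via zarrita; other keys (attrs etc.) are
// plain objects without get() and are simply recursed into or skipped.
async function walkAndOpen(node: Record<string, any>, path: string[] = []): Promise<void> {
  for (const [name, child] of Object.entries(node)) {
    if (child && typeof child.get === 'function') {
      await child.get(); // throws if the array cannot actually be opened
      console.log('ok:', [...path, name].join('/'));
    } else if (child && typeof child === 'object') {
      await walkAndOpen(child, [...path, name]);
    }
  }
}

// Hypothetical usage against the locally served blobs fixture; readZarr may
// return a Result wrapper in practice, treated here as resolving directly.
const sdata = await readZarr('http://localhost:8080/v0.6.1/blobs.zarr');
await walkAndOpen(sdata.rootStore.tree ?? {});
```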

Summary by CodeRabbit

  • New Features

    • Cross-version testing suite: generate fixtures for multiple library versions, local test server and CORS proxy, and JS/Python dataset validators with test commands.
  • Bug Fixes

    • Improved handling and normalization of raster/Zarr metadata and safer fallback for parquet loading.
  • Documentation

    • Expanded Testing section with prerequisites, per-version workflows, and commands for unit/integration/fixture generation.
  • Chores

    • CI test workflow, workspace Python/editor settings, test config and dependency updates.

@xinaesthete (Contributor, Author) commented:

When opening an image from a v0.6.1 dataset:

Access to fetch at 'http://localhost:8080/v0.6.1/blobs.zarr/images/blobs_multiscale_image/.zgroup' from origin 'http://localhost:5173' has been blocked by CORS policy: Method PUT is not allowed by Access-Control-Allow-Methods in preflight response.

I'm not sure why zarrita is making PUT requests to read-only stores here?
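
In case it helps narrow this down, one generic way to find out which layer issues the write (a debugging sketch, not from this PR; it assumes zarrita goes through the global fetch) is to wrap fetch and log any non-GET/HEAD request with a stack trace:

```ts
// Temporary instrumentation: log unexpected write requests made via fetch.
const originalFetch = globalThis.fetch;
globalThis.fetch = async (input, init) => {
  const method =
    init?.method ?? (input instanceof Request ? input.method : 'GET');
  if (method !== 'GET' && method !== 'HEAD') {
    const url =
      typeof input === 'string' ? input : input instanceof Request ? input.url : String(input);
    console.warn(`unexpected ${method} to ${url}`, new Error().stack);
  }
  return originalFetch(input, init);
};
```

If the stack points into zarrita (or into application code), that narrows down where the PUT originates.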

@coderabbitai coderabbitai bot (Contributor) commented Dec 15, 2025

Warning

Rate limit exceeded

@xinaesthete has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 8 minutes and 5 seconds before requesting another review.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

📝 Walkthrough

Adds Zarr v2/v3 detection and normalization, a consolidated tree-backed store parser and Zod-based Zarr validators, per-version Python fixture generation and validation tooling, local test server/CORS proxy, CI workflow, tests, and several package dependency/type updates across the monorepo.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| **Editor & Python config**<br>`.cursor/settings.json`, `.vscode/settings.json`, `python/.python-version` | Pinned workspace Python to 3.12/venv, auto-activate terminal venv, set Black formatter and formatOnSave. |
| **CI & Git**<br>`.github/workflows/test.yml`, `.gitignore` | New "Test" GitHub Actions workflow for building, generating Python fixtures, caching uv envs, and running unit/integration tests; `.gitignore` updated to exclude fixtures, validation outputs, venvs, and uv.lock. |
| **Docs & npm scripts**<br>`README.md`, `package.json` | Large Testing section added to README and many npm scripts for tests, fixture generation, server/proxy, and dataset validation. |
| **Test runner & workspace**<br>`vitest.config.ts`, `pnpm-workspace.yaml`, `packages/*/vite.config.ts` | Top-level Vitest config and per-package vite configs updated for tests/coverage; pnpm catalog swapped viv package and added jsdom. |
| **Core store, types & models**<br>`packages/core/src/store/index.ts`, `packages/core/src/types.ts`, `packages/core/src/models/index.ts` | Switch SpatialData.rootStore to ConsolidatedStore, remove parsed cache, read/serialize from rootStore.tree, adopt tree-backed element loading and stricter validation/error handling per element. |
| **Schemas & public types**<br>`packages/core/src/schemas/index.ts`, `packages/core/src/vite-env.d.ts` | Raster attrs now detect/normalize OME-NGFF v2/v3 into explicit RasterAttrs type; omero.label coerced to string; vite-env declaration added. |
| **Zarrextra types & parsing**<br>`packages/zarrextra/src/types.ts`, `packages/zarrextra/src/index.ts` | Added explicit Zarr v2/v3 metadata typings, v2→v3 normalization, metadata-driven path-based tree construction; several helpers made internal; LazyZarrArray re-exported. |
| **Zarr schema validators**<br>`packages/zarrextra/src/zarrSchema.ts`, `packages/zarrextra/package.json` | Introduce Zod-based v2/v3 zarray schemas, validateAndConvertV2Zarray and validateV3Zarray; added zod dependency. |
| **Zarrextra tests & config**<br>`packages/zarrextra/tests/zarrSchema.spec.ts`, `packages/zarrextra/vite.config.ts` | Comprehensive zarray schema tests added and test include pattern enabled. |
| **Fixture generation & validation (Python)**<br>`python/scripts/generate_fixtures.py`, `python/scripts/validate_datasets.py`, `python/v*/generate_fixtures.py`, `python/v*/pyproject.toml` | Orchestrator and per-version fixture generators (0.5.0, 0.6.1, 0.7.0) + manifests; dataset validation CLI with parallelism and Markdown/CSV/JSON outputs. |
| **Test servers & validation JS**<br>`scripts/test-server.js`, `scripts/cors-proxy.js`, `scripts/validate-datasets-js.js` | Add static fixture HTTP server (with CORS and directory listings), a CORS proxy with port fallback, and a JS dataset validation tool (optional Python comparison). |
| **Integration & unit tests**<br>`tests/integration/fixtures.test.ts`, `packages/core/tests/schemas.spec.ts`, `packages/vis/tests/index.spec.tsx`, `packages/react/tests/index.spec.tsx` | Integration tests for fixtures, schema/unit tests, and package-level smoke tests added; one legacy core unit test removed. |
| **Visualization package updates**<br>`packages/vis/package.json`, `packages/vis/src/...`, `packages/vis/tests/*` | Swap imports to @hms-dbmi/viv, remove BIO_ZARR usage (narrowed types), add jsdom dev dep and related tests. |
| **Parquet fallback**<br>`packages/core/src/models/VTableSource.ts` | Parquet loader now tries local parquet-wasm import then falls back to a CDN module with warnings and combined error messaging. |
| **Auxiliary test config**<br>`vitest.config.ts` | Top-level Vitest config added (globals, Node env, coverage reporters, package aliases). |

Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Client
    participant Open as openExtraConsolidated
    participant Try as tryConsolidated
    participant Remote as RemoteStore
    participant Norm as ValidatorNormalizer
    participant Parser as parseStoreContents
    participant Tree as ZarrTree

    Client->>Open: openExtraConsolidated(source)
    Open->>Try: tryConsolidated(store)
    Try->>Remote: fetch zarr.json
    alt v3 present
        Remote-->>Norm: v3 consolidated_metadata
        Norm->>Parser: validateV3Zarray & build nodes
    else v3 absent
        Try->>Remote: fetch .zmetadata / zmetadata (v2)
        Remote-->>Norm: v2 metadata
        Norm->>Norm: validateAndConvertV2Zarray -> v3-normalized
        Norm->>Parser: normalized metadata
    end
    Parser->>Tree: build path-based ZarrTree
    Parser-->>Open: return { zarritaStore, tree }
    Open-->>Client: Ok({zarritaStore, tree})
```

```mermaid
sequenceDiagram
    participant User
    participant CLI as pnpm
    participant Orch as python/scripts/generate_fixtures.py
    participant UV as uv
    participant Gen050 as python/v0.5.0/generate_fixtures.py
    participant Gen061 as python/v0.6.1/generate_fixtures.py
    participant Gen070 as python/v0.7.0/generate_fixtures.py
    participant Out as test-fixtures

    User->>CLI: run test:fixtures:generate --all
    CLI->>Orch: start orchestration
    Orch->>UV: uv sync v0.5.0
    UV-->>Gen050: run generator
    Gen050->>Out: write v0.5.0/blobs.zarr
    Orch->>UV: uv sync v0.6.1
    UV-->>Gen061: run generator
    Gen061->>Out: write v0.6.1/blobs.zarr
    Orch->>UV: uv sync v0.7.0
    UV-->>Gen070: run generator
    Gen070->>Out: write v0.7.0/blobs.zarr
    Orch-->>User: aggregated success / exit code
```

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~75 minutes

Poem

🐰
I sniffed through trees of metadata, bright,
Turned v2 whispers into v3 light.
Fixtures planted, servers hum,
Tests hop in — the work is done.
A crunchy carrot for passing runs!

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit's high-level summary is enabled. |
| Title check | ✅ Passed | The title 'Support newer versions of SpatialData with zarr v3 stores' accurately describes the primary objective of the PR: adding zarr v3 support for newer SpatialData versions. |
| Linked Issues check | ✅ Passed | The PR comprehensively addresses issue #4 by refactoring zarrextra to handle zarr v3 consolidated metadata, updating the RasterAttrs schema for different nesting formats, and adding testing infrastructure for SpatialData 0.5.0, 0.6.1, and 0.7.0. |
| Out of Scope Changes check | ✅ Passed | All changes are in-scope: zarrextra updates handle zarr v3, schema changes support newer SpatialData versions, testing scripts enable validation across versions, and dependency updates (viv, jsdom) support the testing infrastructure. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 83.78%, which is sufficient. The required threshold is 80.00%. |



@coderabbitai coderabbitai bot (Contributor) left a comment


Actionable comments posted: 8

🧹 Nitpick comments (33)
vitest.config.ts (1)

1-24: Ensure __dirname works with your Vitest module format

If your project is using ESM (e.g. "type": "module" in package.json or TS compiled as ESM), __dirname won’t be defined in vitest.config.ts, which would break test startup. In that case, consider computing it from import.meta.url:

-import { defineConfig } from 'vitest/config';
-import { resolve } from 'node:path';
-
-export default defineConfig({
+import { defineConfig } from 'vitest/config';
+import { resolve, dirname } from 'node:path';
+import { fileURLToPath } from 'node:url';
+
+const __dirname = dirname(fileURLToPath(import.meta.url));
+
+export default defineConfig({
   test: {
@@
   resolve: {
     alias: {
       '@spatialdata/core': resolve(__dirname, 'packages/core/src'),
       '@spatialdata/zarrextra': resolve(__dirname, 'packages/zarrextra/src'),
     },
   },
 });

If the repo is still CJS-only, the existing __dirname usage is fine, but it’s worth confirming the module type to avoid subtle config failures.

packages/core/src/store/index.ts (1)

82-119: Consider a safer toJSON fallback than returning this

toJSON() currently returns the instance itself when this.rootStore.tree is falsy, which can be surprising and risks odd serialization behavior (or recursion) if JSON.stringify is used on SpatialData. Since ConsolidatedStore.tree should normally exist, a minimal or empty fallback is safer:

   toJSON() {
-    if (!this.rootStore.tree) return this;
-    return serializeZarrTree(this.rootStore.tree);
+    if (!this.rootStore.tree) {
+      // Fallback to a minimal representation instead of the full instance
+      return { url: this.url };
+    }
+    return serializeZarrTree(this.rootStore.tree);
   }

The new toString() with a try/catch looks good and should make debugging bad metadata less painful.

packages/zarrextra/src/types.ts (1)

45-135: Align comments/types with v3-normalized metadata and consider widening fill_value

You now normalize v2 .zmetadata to the v3-shaped ZarrV3Metadata and type IntermediateConsolidatedStore as Store & { zmetadata: ZarrV3Metadata }, which is a clear internal contract; the surrounding comments about a v2/v3 union and ConsolidatedMetadata/MetadataFormat could be updated or trimmed so future readers don’t expect both shapes to appear at runtime.

Also, ZarrV3ArrayNode.fill_value is restricted to number | string | boolean, but real Zarr stores may use null or more complex types. If you ever surface fill_value beyond debugging, you might want a looser type (e.g. unknown or number | string | boolean | null) to better reflect what can be in the metadata.

packages/zarrextra/src/index.ts (2)

15-101: Use per-level paths for attribute lookup when building the tree

In parseStoreContents, you currently call:

const attrs = getZattrs(normalizedPath as zarr.AbsolutePath, store);

inside the loop over pathParts. This means that when you synthesize intermediate group nodes for paths that don’t have their own metadata entries, they’ll look up attributes using the full leaf path and can accidentally inherit the leaf array’s attributes.

You can make attribute resolution more robust for both groups and arrays by computing the path prefix for the node you’re creating:

-    for (const [i, part] of pathParts.entries()) {
+    for (const [i, part] of pathParts.entries()) {
       if (!(part in currentNode)) {
         const isLeaf = i === pathParts.length - 1;
         const isArray = isLeaf && info.isArray;
-        
-        // Get attributes for this path (normalizedPath already has leading /)
-        const attrs = getZattrs(normalizedPath as zarr.AbsolutePath, store);
+        // Get attributes for this node using its own path prefix
+        const currentPath = `/${pathParts.slice(0, i + 1).join('/')}`;
+        const attrs = getZattrs(currentPath as zarr.AbsolutePath, store);
@@
           currentNode[part] = {
             [ATTRS_KEY]: attrs,
             [ZARRAY_KEY]: zarray,
             get: () => zarr.open(root.resolve(normalizedPath), { kind: 'array' })
           };

This keeps array lookups unchanged while ensuring group nodes only ever consult their own metadata paths.


134-224: Preserve v2 dtype when normalizing to v3 data_type

In normalizeV2ToV3Metadata, array nodes default data_type to 'float64' unless zarray.data_type exists:

data_type: (zarray.data_type as string) || 'float64',

For v2 .zarray metadata, the field is typically dtype, not data_type, so you’ll silently treat all such arrays as float64 in your normalized metadata even when the underlying dtype is different. If any downstream logic introspects data_type, it will see incorrect values.

You can cheaply preserve the v2 information by falling back to dtype:

-    if (group.zarray) {
-      // Array node - use zarray metadata as the base
-      const zarray = group.zarray as Record<string, unknown>;
-      metadata[path] = {
-        shape: (zarray.shape as number[]) || [],
-        data_type: (zarray.data_type as string) || 'float64',
+    if (group.zarray) {
+      // Array node - use zarray metadata as the base
+      const zarray = group.zarray as Record<string, unknown>;
+      const dtype = (zarray.data_type ?? zarray.dtype) as string | undefined;
+      metadata[path] = {
+        shape: (zarray.shape as number[]) || [],
+        data_type: dtype || 'float64',
@@
-        zarr_format: (zarray.zarr_format as number) || 3,
+        zarr_format: (zarray.zarr_format as number) || 3,

This keeps the internal v3-like shape while more accurately reflecting v2 array metadata.

.vscode/settings.json (1)

1-14: Consider whether to commit IDE settings.

Committing .vscode/settings.json directly can cause conflicts when team members have different setups (e.g., different Python versions or custom extensions). A common practice is to either:

  1. Add .vscode/settings.json to .gitignore and provide a .vscode/settings.json.sample as a template
  2. Keep only team-agreed settings and document others in README

This is minor since the settings are Python-related and align with the project's testing infrastructure.

.github/workflows/test.yml (2)

31-33: Consider pinning uv version for reproducibility.

Using version: "latest" may cause unexpected CI failures when a new uv version introduces breaking changes. Consider pinning to a specific version.

       - uses: astral-sh/setup-uv@v4
         with:
-          version: "latest"
+          version: "0.5.10"  # or current stable version

55-74: Add failure handling when server doesn't start.

If the server fails to start within 30 seconds, the loop exits silently and integration tests run anyway (likely failing with confusing errors). Consider adding explicit failure detection.

The shellcheck warning about unused i is a false positive for this retry-loop pattern.

           # Wait for server to be ready (up to ~30s)
+          SERVER_READY=false
           for i in {1..30}; do
             if curl -sSf http://localhost:8080/ >/dev/null; then
               echo "Test server is up"
+              SERVER_READY=true
               break
             fi
             echo "Waiting for test server on http://localhost:8080/ ..."
             sleep 1
           done

+          if [ "$SERVER_READY" != "true" ]; then
+            echo "ERROR: Test server failed to start within 30 seconds"
+            kill "$SERVER_PID" 2>/dev/null || true
+            exit 1
+          fi
+
           # Run integration tests (will hit http://localhost:8080/…)
           pnpm test:integration
scripts/cors-proxy.js (1)

40-119: Consider adding request timeout.

The fetch call has no timeout, which could cause the proxy to hang indefinitely if the target server doesn't respond. For a development tool, this may be acceptable, but consider adding a timeout for better developer experience.

     // Make the request
-    const response = await fetch(targetUrl, fetchOptions);
+    const controller = new AbortController();
+    const timeoutId = setTimeout(() => controller.abort(), 30000); // 30s timeout
+    
+    try {
+      const response = await fetch(targetUrl, { ...fetchOptions, signal: controller.signal });
+      clearTimeout(timeoutId);
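
For reference, the same pattern as a small self-contained helper (a sketch; targetUrl/fetchOptions correspond to the variables in the suggestion above, and the 30-second value is arbitrary):

```ts
// Fetch with a hard timeout via AbortController; the timer is always cleared.
async function fetchWithTimeout(
  targetUrl: string,
  fetchOptions: RequestInit = {},
  timeoutMs = 30_000,
): Promise<Response> {
  const controller = new AbortController();
  const timeoutId = setTimeout(() => controller.abort(), timeoutMs);
  try {
    return await fetch(targetUrl, { ...fetchOptions, signal: controller.signal });
  } finally {
    clearTimeout(timeoutId);
  }
}
```
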
scripts/test-server.js (2)

78-143: Consider adding OPTIONS preflight handler for CORS.

The server adds CORS headers to GET responses but doesn't handle OPTIONS preflight requests explicitly. While browsers may not send preflight for simple GET requests, adding OPTIONS support would ensure consistent behavior with the CORS proxy.

 async function handleRequest(req, res) {
   try {
+    // Handle OPTIONS preflight
+    if (req.method === 'OPTIONS') {
+      res.writeHead(200, {
+        'Access-Control-Allow-Origin': '*',
+        'Access-Control-Allow-Methods': 'GET, HEAD, OPTIONS',
+        'Access-Control-Allow-Headers': 'Range',
+      });
+      res.end();
+      return;
+    }
+
     // Remove query string and normalize path

91-99: Path traversal check has potential edge case with symlinks.

The check !fullPath.startsWith(fixturesDir) validates the resolved path, which is good. However, if fixturesDir contains symlinks pointing outside the fixtures directory, they could bypass this check. For a test server this is low risk, but consider using realpath for stricter validation if needed.
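
A possible stricter variant (sketch only; isInsideFixtures is a hypothetical helper, not part of the script) that canonicalizes symlinks before the containment check:

```ts
import { realpath } from 'node:fs/promises';
import { resolve, sep } from 'node:path';

// Resolve symlinks on both sides before comparing path prefixes.
async function isInsideFixtures(requestedPath: string, fixturesDir: string): Promise<boolean> {
  const realBase = await realpath(fixturesDir);
  // realpath throws for paths that don't exist, so treat that as "not inside".
  const realTarget = await realpath(resolve(fixturesDir, requestedPath)).catch(() => null);
  return realTarget !== null && (realTarget === realBase || realTarget.startsWith(realBase + sep));
}
```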

tests/unit/schemas.test.ts (1)

197-302: Consider adding negative test cases for invalid data.

The tests validate happy paths well, but could benefit from additional negative test cases, such as:

  • Invalid encoding-type values
  • Missing required fields in tableAttrsSchema
  • Malformed coordinate systems

Example addition for tableAttrsSchema:

it('should reject missing required fields', () => {
  const attrs = {
    instance_key: 'cell_id',
    // missing region, region_key, spatialdata-encoding-type
  };

  expect(() => tableAttrsSchema.parse(attrs)).toThrow();
});
README.md (1)

184-186: Add language specifier to fenced code block.

The fenced code block is missing a language specifier. Based on static analysis hint (MD040).

-```
+```text
 http://localhost:8081/https://example.com/data.zarr/.zattrs

package.json (1)

14-26: Well-organized test scripts with good naming conventions.

The scripts provide a comprehensive testing workflow. Minor consideration:

The `validate:all` script uses `bash scripts/validate-all.sh` which may not work on Windows without WSL or Git Bash. Consider using a cross-platform alternative or documenting this limitation.

-    "validate:all": "bash scripts/validate-all.sh",
+    "validate:all": "node scripts/validate-all.js",

Alternatively, add a note in the README that this script requires a bash-compatible shell.

tests/integration/fixtures.test.ts (2)

16-35: Consider escaping version parameter in shell command.

While version comes from a controlled array, the use of string interpolation in execSync could be a concern if the test matrix is ever extended with external input.

 function ensureFixtures(version: string): string {
   const fixturePath = join(projectRoot, 'test-fixtures', `v${version}`, 'blobs.zarr');
   
   if (!existsSync(fixturePath)) {
     console.log(`Fixtures not found for version ${version}, generating...`);
     try {
-      execSync(`uv run python/scripts/generate_fixtures.py --version ${version}`, {
+      execSync(`uv run python/scripts/generate_fixtures.py --version "${version}"`, {
         cwd: projectRoot,
         stdio: 'inherit',
       });

Or use spawnSync with an arguments array to avoid shell interpretation entirely.
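
A sketch of that alternative (version and projectRoot mirror the test code above):

```ts
import { spawnSync } from 'node:child_process';

// Sketch: run the fixture generator without a shell, so arguments are never
// re-interpreted by string interpolation.
function generateFixtures(version: string, projectRoot: string): void {
  const result = spawnSync(
    'uv',
    ['run', 'python/scripts/generate_fixtures.py', '--version', version],
    { cwd: projectRoot, stdio: 'inherit' },
  );
  if (result.status !== 0) {
    throw new Error(`Fixture generation failed for version ${version}`);
  }
}
```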


49-132: Tests silently pass when server is not running.

The tests return early with a console.warn when the server isn't running, which means they'll appear to pass in CI even when not actually exercised. Consider using it.skip() or throwing a skip exception to make this more visible.

     } catch (error) {
       if (error instanceof Error && error.message.includes('fetch')) {
-        console.warn(
-          'Skipping integration test - test server not running. ' +
-          'Start it with: pnpm test:server'
-        );
-        return;
+        // Use Vitest's skip API to make skipped tests visible
+        throw new Error(
+          'Test skipped: test server not running. Start it with: pnpm test:server'
+        );
       }
       throw error;
     }

Alternatively, use a beforeAll hook to check server availability and skip the entire describe block if unavailable.
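
One way that alternative could look in Vitest (a sketch; BASE_URL is an assumption, and top-level await is relied on, which Vitest's ESM test files support):

```ts
import { describe, it } from 'vitest';

// Probe the fixture server once while the test file is collected; if it is
// unreachable the whole suite is reported as skipped instead of silently passing.
const BASE_URL = 'http://localhost:8080/';
const serverUp = await fetch(BASE_URL)
  .then((r) => r.ok)
  .catch(() => false);

describe.skipIf(!serverUp)('fixture integration', () => {
  it('loads the v0.6.1 blobs fixture', async () => {
    // ...existing readZarr-based assertions from fixtures.test.ts would go here...
  });
});
```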

packages/core/src/models/index.ts (1)

480-504: Error handling in loadElements prevents cascading failures.

The try-catch around createElement (lines 495-502) is a reasonable trade-off to prevent a single malformed element from breaking the entire load. The comment appropriately acknowledges the potential confusion of returning {} vs undefined.

Consider tracking failed elements and exposing them for debugging:

const result: Record<string, ElementInstanceMap[T]> = {};
const errors: Record<string, Error> = {};
for (const key of keys) {
  try {
    result[key] = createElement(name, sdata, key);
  } catch (error) {
    errors[key] = error as Error;
    console.error(`Failed to load ${name}/${key}:`, error);
  }
}
// Optionally return { elements: result, errors } or attach to metadata
return Object.keys(result).length > 0 ? result : undefined;

This would give consumers visibility into partial load failures.

scripts/validate-all.sh (1)

1-14: Script setup looks good, but consider adding dependency checks.

The script uses external tools (uv, pnpm, jq, node) without verifying they're available. Consider adding dependency checks at the start to provide clearer error messages.

 set -e  # Exit on error
 
+# Check required dependencies
+for cmd in uv pnpm jq node; do
+  if ! command -v "$cmd" &> /dev/null; then
+    echo -e "${RED}Error: Required command '$cmd' not found${NC}"
+    exit 1
+  fi
+done
+
 # Colors for output
python/v0.6.1/generate_fixtures.py (3)

27-29: Remove extraneous f-string prefix.

Line 29 has an f-string without any placeholders. As flagged by Ruff (F541).

     if actual_version != version:
         print(f"⚠️  Warning: Expected version {version} but got {actual_version}")
-        print(f"   This may indicate the wrong environment is active.")
+        print("   This may indicate the wrong environment is active.")

42-46: Move import to top of file.

The shutil import inside the function is unconventional. Consider moving it to the top-level imports for better readability and consistency with Python conventions.

 import sys
 from pathlib import Path
+import shutil
 
 # Add scripts directory to path if we need shared utilities

Then remove line 43:

-    import shutil

85-90: Rename unused loop variable.

The variable elem is not used within the loop body. As flagged by Ruff (B007).

             elif isinstance(elements, list):
                 print(f"  - {element_type}: {len(elements)} element(s)")
-                for i, elem in enumerate(elements):
+                for i, _elem in enumerate(elements):
                     print(f"    * element_{i}")
python/v0.5.0/generate_fixtures.py (6)

27-29: Remove extraneous f-string prefix.

Line 29 has an f-string without any placeholders. As flagged by Ruff (F541).

     if actual_version != version:
         print(f"⚠️  Warning: Expected version {version} but got {actual_version}")
-        print(f"   This may indicate the wrong environment is active.")
+        print("   This may indicate the wrong environment is active.")

50-54: Avoid bare except clause.

The bare except: will catch all exceptions including KeyboardInterrupt and SystemExit, which is generally undesirable. As flagged by Ruff (E722).

             try:
                 delattr(sdata, "points")
-            except:
+            except Exception:
                 # If that fails, set to empty dict
                 sdata.points = {}

72-79: Avoid bare except with pass and consider logging.

The try-except-pass pattern silently swallows errors. While this is defensive cleanup code, logging the exception would help with debugging. As flagged by Ruff (E722, S110).

     # Double-check it's gone
     if os.path.exists(store_path):
         print(f"Warning: Store path still exists after removal attempt: {store_path}")
         # Force remove
         try:
             shutil.rmtree(store_path, ignore_errors=True)
-        except:
-            pass
+        except Exception as e:
+            print(f"Warning: Force removal also failed: {e}")

93-102: Consider catching more specific exceptions for cleanup.

The broad Exception catch and subsequent bare except: make debugging harder. Consider at minimum logging the cleanup failure.

     except Exception as e:
         # Clean up any partially written store
         if os.path.exists(store_path):
             try:
                 if os.path.isdir(store_path):
                     shutil.rmtree(store_path)
                 else:
                     os.remove(store_path)
-            except:
-                pass
+            except OSError as cleanup_error:
+                print(f"Warning: Cleanup failed: {cleanup_error}")

131-134: Rename unused loop variable.

The variable elem is not used within the loop body. As flagged by Ruff (B007).

             elif isinstance(elements, list):
                 print(f"  - {element_type}: {len(elements)} element(s)")
-                for i, elem in enumerate(elements):
+                for i, _elem in enumerate(elements):
                     print(f"    * element_{i}")

20-148: Significant code duplication with v0.6.1 generator.

The generate_fixtures function shares substantial logic with python/v0.6.1/generate_fixtures.py. The main differences are the version string and the points workaround. Consider extracting shared utilities to reduce maintenance burden.

A shared module in python/scripts/ could contain:

  • The element enumeration/printing logic (lines 121-144)
  • The coordinate systems printing logic (lines 138-144)
  • The common write fallback pattern

This would make version-specific scripts smaller and easier to maintain.

scripts/validate-datasets-js.js (1)

223-262: Consider using a CLI parsing library.

The manual argument parsing works but is verbose and error-prone. For a script of this complexity, consider using a library like commander or yargs, or at minimum, add validation for unknown arguments.

Currently unknown arguments are silently ignored. Consider:

} else if (!arg.startsWith('--')) {
  // Positional arguments - could be intentional or typo
} else {
  console.error(`Unknown option: ${arg}`);
  process.exit(1);
}
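
Node's built-in util.parseArgs is one dependency-free option; a sketch follows (flag names here are illustrative, not necessarily the script's actual options):

```ts
import { parseArgs } from 'node:util';

// strict: true makes parseArgs throw on unrecognised --flags instead of
// silently ignoring them; allowPositionals keeps bare arguments usable.
const { values, positionals } = parseArgs({
  options: {
    proxy: { type: 'boolean', default: false },
    output: { type: 'string' },
    'output-format': { type: 'string', default: 'markdown' },
  },
  strict: true,
  allowPositionals: true,
});

console.log(values, positionals);
```
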
python/scripts/validate_datasets.py (5)

172-181: Remove unused exception variable.

The exception variable e is captured but never used. As per static analysis (Ruff F841).

-        except (json.JSONDecodeError, IndexError) as e:
+        except (json.JSONDecodeError, IndexError):

257-281: Remove unnecessary f-string prefixes.

Several strings have f prefix but contain no placeholders. As per static analysis (Ruff F541).

             if result.success:
-                lines.append(f"**Status:** ✅ Success")
+                lines.append("**Status:** ✅ Success")
                 lines.append("")
             else:
-                lines.append(f"**Status:** ❌ Failed")
+                lines.append("**Status:** ❌ Failed")
                 lines.append("")
                 lines.append(f"**Error Type:** `{result.error_type}`")
                 lines.append("")
-                lines.append(f"**Error Message:**")
+                lines.append("**Error Message:**")

219-225: Consider extracting version constants to avoid duplication.

The version strings "0.5.0" and "0.6.1" are hardcoded in multiple places (here, line 249, line 342, line 399). Extracting to a module-level constant would improve maintainability when adding new versions.

# Near top of file
SUPPORTED_VERSIONS = ["0.5.0", "0.6.1"]

401-413: Consider verifying environment directory exists before sync.

If the environment directory python/v{version} doesn't exist, uv sync will fail with a potentially confusing error. Adding a pre-check would provide clearer feedback.

     # Ensure environments are set up
     for version in versions:
         env_dir = project_root / "python" / f"v{version}"
+        if not env_dir.exists():
+            print(f"Error: Environment directory not found: {env_dir}", file=sys.stderr)
+            print("Please ensure the version-specific directories are set up.", file=sys.stderr)
+            sys.exit(1)
         print(f"Setting up environment for spatialdata {version}...", file=sys.stderr)

474-477: Unreachable else branch.

The argparse choices constraint ensures output_format is always one of markdown, csv, or json. The else branch is unreachable dead code.

     elif args.output_format == "json":
         output = json.dumps([r.to_dict() for r in results], indent=2)
-    else:
-        output = ""
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e64f484 and b90367c.

📒 Files selected for processing (26)
  • .cursor/settings.json (1 hunks)
  • .github/workflows/test.yml (1 hunks)
  • .gitignore (1 hunks)
  • .vscode/settings.json (1 hunks)
  • README.md (2 hunks)
  • package.json (1 hunks)
  • packages/core/src/models/index.ts (6 hunks)
  • packages/core/src/schemas/index.ts (3 hunks)
  • packages/core/src/store/index.ts (3 hunks)
  • packages/core/src/types.ts (2 hunks)
  • packages/zarrextra/src/index.ts (2 hunks)
  • packages/zarrextra/src/types.ts (1 hunks)
  • python/.python-version (1 hunks)
  • python/scripts/generate_fixtures.py (1 hunks)
  • python/scripts/validate_datasets.py (1 hunks)
  • python/v0.5.0/generate_fixtures.py (1 hunks)
  • python/v0.5.0/pyproject.toml (1 hunks)
  • python/v0.6.1/generate_fixtures.py (1 hunks)
  • python/v0.6.1/pyproject.toml (1 hunks)
  • scripts/cors-proxy.js (1 hunks)
  • scripts/test-server.js (1 hunks)
  • scripts/validate-all.sh (1 hunks)
  • scripts/validate-datasets-js.js (1 hunks)
  • tests/integration/fixtures.test.ts (1 hunks)
  • tests/unit/schemas.test.ts (1 hunks)
  • vitest.config.ts (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (12)
scripts/test-server.js (1)
scripts/cors-proxy.js (2)
  • PORT (15-15)
  • server (167-167)
tests/integration/fixtures.test.ts (3)
scripts/test-server.js (3)
  • __filename (15-15)
  • __dirname (16-16)
  • projectRoot (17-17)
scripts/validate-datasets-js.js (1)
  • sdata (79-79)
packages/core/src/store/index.ts (1)
  • readZarr (122-132)
vitest.config.ts (1)
scripts/test-server.js (1)
  • __dirname (16-16)
tests/unit/schemas.test.ts (1)
packages/core/src/schemas/index.ts (6)
  • rasterAttrsSchema (322-345)
  • coordinateTransformationSchema (107-107)
  • shapesAttrsSchema (370-377)
  • pointsAttrsSchema (385-392)
  • tableAttrsSchema (399-406)
  • spatialDataSchema (413-416)
packages/core/src/store/index.ts (3)
packages/zarrextra/src/index.ts (3)
  • ConsolidatedStore (384-384)
  • serializeZarrTree (353-381)
  • openExtraConsolidated (297-320)
packages/zarrextra/src/types.ts (1)
  • ConsolidatedStore (144-147)
packages/core/src/types.ts (2)
  • ElementNames (28-28)
  • ElementName (29-29)
packages/core/src/models/index.ts (2)
packages/core/src/types.ts (1)
  • ZarrTree (76-76)
packages/zarrextra/src/types.ts (1)
  • ZarrTree (40-43)
scripts/validate-datasets-js.js (2)
packages/core/src/store/index.ts (1)
  • readZarr (122-132)
python/scripts/validate_datasets.py (1)
  • main (335-486)
python/v0.5.0/generate_fixtures.py (2)
python/v0.6.1/generate_fixtures.py (2)
  • generate_fixtures (20-102)
  • main (105-125)
python/scripts/generate_fixtures.py (1)
  • main (15-96)
python/scripts/generate_fixtures.py (2)
python/v0.5.0/generate_fixtures.py (1)
  • main (151-171)
python/v0.6.1/generate_fixtures.py (1)
  • main (105-125)
packages/core/src/types.ts (2)
packages/zarrextra/src/index.ts (1)
  • ConsolidatedStore (384-384)
packages/zarrextra/src/types.ts (1)
  • ConsolidatedStore (144-147)
python/v0.6.1/generate_fixtures.py (1)
python/v0.5.0/generate_fixtures.py (2)
  • generate_fixtures (20-148)
  • main (151-171)
packages/zarrextra/src/index.ts (3)
packages/zarrextra/src/types.ts (10)
  • IntermediateConsolidatedStore (133-135)
  • ZarrTree (40-43)
  • ZarrV3GroupNode (86-95)
  • ZarrV3ArrayNode (56-81)
  • ZAttrsAny (11-11)
  • ATTRS_KEY (16-16)
  • ZARRAY_KEY (21-21)
  • ZarrV3Metadata (103-112)
  • ZarrV2Metadata (49-51)
  • ConsolidatedStore (144-147)
packages/core/src/types.ts (7)
  • ZarrTree (76-76)
  • ZAttrsAny (76-76)
  • ATTRS_KEY (77-77)
  • ZARRAY_KEY (77-77)
  • Result (81-81)
  • Err (82-82)
  • Ok (82-82)
packages/zarrextra/src/result.ts (3)
  • Result (17-19)
  • Err (25-25)
  • Ok (22-22)
🪛 actionlint (1.7.9)
.github/workflows/test.yml

55-55: shellcheck reported issue in this script: SC2034:warning:6:1: i appears unused. Verify use (or export if used externally)

(shellcheck)

🪛 markdownlint-cli2 (0.18.1)
README.md

184-184: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🪛 Ruff (0.14.8)
python/scripts/validate_datasets.py

151-151: subprocess call: check for execution of untrusted input

(S603)


152-152: Starting a process with a partial executable path

(S607)


172-172: Local variable e is assigned to but never used

Remove assignment to unused variable e

(F841)


192-192: Do not catch blind exception: Exception

(BLE001)


258-258: f-string without any placeholders

Remove extraneous f prefix

(F541)


274-274: f-string without any placeholders

Remove extraneous f prefix

(F541)


278-278: f-string without any placeholders

Remove extraneous f prefix

(F541)


405-405: subprocess call: check for execution of untrusted input

(S603)


406-406: Starting a process with a partial executable path

(S607)

python/v0.5.0/generate_fixtures.py

29-29: f-string without any placeholders

Remove extraneous f prefix

(F541)


52-52: Do not use bare except

(E722)


69-69: Do not catch blind exception: Exception

(BLE001)


78-78: Do not use bare except

(E722)


78-79: try-except-pass detected, consider logging the exception

(S110)


101-101: Do not use bare except

(E722)


101-102: try-except-pass detected, consider logging the exception

(S110)


112-114: Avoid specifying long messages outside the exception class

(TRY003)


133-133: Loop control variable elem not used within loop body

Rename unused elem to _elem

(B007)

python/scripts/generate_fixtures.py

57-57: subprocess call: check for execution of untrusted input

(S603)


58-58: Starting a process with a partial executable path

(S607)


74-74: subprocess call: check for execution of untrusted input

(S603)


75-80: Starting a process with a partial executable path

(S607)

python/v0.6.1/generate_fixtures.py

29-29: f-string without any placeholders

Remove extraneous f prefix

(F541)


87-87: Loop control variable elem not used within loop body

Rename unused elem to _elem

(B007)

🔇 Additional comments (39)
.gitignore (3)

12-13: LGTM: Test artifact directories appropriately ignored.

The additions of test-fixtures/ and validation-results/ align with the test infrastructure introduced in this PR and keep generated artifacts out of version control.


15-19: LGTM: Version-specific Python environments properly isolated.

The ignore rules for python/v0.5.0/.venv/, python/v0.6.1/.venv/, and their respective uv.lock files are appropriate for isolating test environments across multiple SpatialData versions without polluting the repository.


26-28: Clarify: Comment mentions .cursor/settings.json but no corresponding ignore rule.

Line 26 notes that .cursor/settings.json is committed for Python interpreter setup, but the .gitignore does not include any rules for the .cursor/ directory (unlike the selective rules for .vscode/). This could lead to unintended commits of .cursor/ files in the future.

Please either:

  1. Add a note explaining that .cursor/ is not used or doesn't need selective ignoring, or
  2. Add corresponding ignore rules for .cursor/ (e.g., .cursor/launch.json and .cursor/tasks.json), similar to the .vscode/ entries.
python/.python-version (1)

1-2: Python version pin matches project configs

Pinning Python to 3.12 here is consistent with the requires-python = ">=3.12" settings in the fixture pyprojects; just ensure CI/dev environments also run 3.12 so behavior matches local testing.

python/v0.5.0/pyproject.toml (1)

1-8: Check spatialdata==0.5.0 support for Python 3.12

requires-python = ">=3.12" plus spatialdata==0.5.0 is fine if that release declares/supports 3.12; otherwise environment resolution could fail or behave oddly. Please double‑check spatialdata 0.5.0’s supported Python versions and adjust requires-python or the pinned version if needed.

.cursor/settings.json (1)

1-14: Cursor Python configuration looks consistent with new fixture envs

Pointing python.defaultInterpreterPath at python/v0.6.1/.venv/bin/python3 and documenting the 0.5.0 alternative in comments is a reasonable default; this should work well as long as contributors switch the path if they primarily work against 0.5.0 fixtures.

python/v0.6.1/pyproject.toml (1)

1-8: Verify spatialdata==0.6.1 on Python 3.12

This env also requires Python ≥3.12 with spatialdata==0.6.1. Please confirm that 0.6.1 officially supports 3.12 so fixture generation doesn’t hit installation/runtime issues; if not, you may need to relax requires-python or choose a different Python version for this env.

packages/core/src/types.ts (2)

17-17: LGTM!

Import of ConsolidatedStore is appropriate for the updated type definition.


88-93: Breaking change to SDataProps interface.

The rootStore type change to ConsolidatedStore is a breaking change that affects consumers of this interface. All internal codebase usages have been updated and are compatible with the new type—both the .tree and .zarritaStore properties are properly accessed throughout the codebase. This aligns with the PR's goal to support zarr v3.

scripts/cors-proxy.js (1)

1-184: LGTM overall with security caveat noted.

The CORS proxy implementation is well-structured for local development. The warning about being development-only is appropriate. The SSRF potential is acknowledged by design since this is meant to proxy arbitrary URLs.

Key positives:

  • Proper CORS header handling
  • URL validation requiring http(s)://
  • Graceful shutdown handling
  • Good error handling with CORS headers preserved on errors
tests/unit/schemas.test.ts (3)

1-10: LGTM! Well-structured test imports and organization.

The imports are clean and cover all the necessary schemas being tested. Good separation of concerns with the test file structure.


11-100: Excellent coverage of version normalization logic.

The rasterAttrsSchema tests comprehensively cover:

  • v0.6.1 format normalization (nested ome key extraction)
  • v0.5.0 format passthrough
  • Preservation of omero data from v0.6.1

This directly validates the core PR objective of supporting both zarr v2 and v3 stores.


102-195: Good transformation schema test coverage.

Tests cover all transformation types: scale, translation, affine, identity, sequence, and coordinate system references. The empty array rejection test (lines 190-194) validates the .min(1) constraint from the schema definition.

README.md (2)

21-22: LGTM! Clear prerequisites documentation.

Good addition of Python and uv requirements for test fixture generation.


53-308: Comprehensive testing documentation.

Excellent documentation covering the complete testing workflow including:

  • Environment setup for multiple SpatialData versions
  • Fixture generation commands
  • Test server and CORS proxy usage
  • Dataset validation with multiple output formats

The security warning for the CORS proxy (line 202) is appropriately highlighted with ⚠️.

python/scripts/generate_fixtures.py (4)

1-12: LGTM! Clear module documentation.

Good docstring explaining the script's purpose and its relationship to version-specific scripts.


15-40: LGTM! Clean argument parsing and path resolution.

The argument parsing with choices validation and sensible defaults is well-implemented. Path resolution from script location is correct.


42-86: Good error handling with continuation on failure.

The script appropriately continues to the next version when one fails, collecting all errors before reporting. The static analysis warnings (S603/S607) about subprocess security are false positives here since:

  1. The uv command is a hardcoded tool name
  2. The version is validated via argparse choices
  3. Paths are constructed from controlled values, not user input

88-100: LGTM! Clear success/failure reporting.

Good use of exit codes and visual feedback for CI/CD integration.

tests/integration/fixtures.test.ts (1)

1-11: LGTM! Standard ESM path resolution.

Correct usage of fileURLToPath and dirname for ESM module path resolution.

packages/core/src/models/index.ts (4)

57-70: LGTM! Clean migration to tree-based store access.

The switch from parsed store to rootStore.tree is well-implemented with appropriate null checks and error messages.


206-211: Stricter validation is appropriate but is a breaking change.

Switching from console.warn + fallback to throwing on validation failure is the right approach for data integrity. However, this is a breaking change for consumers who previously relied on the fallback behavior.

Ensure this breaking change is documented in release notes. Consumers who handle malformed attrs may need to update their error handling.


336-344: LGTM! Consistent validation and store access pattern.

ShapesElement follows the same stricter validation pattern and correctly uses zarritaStore from the root store for geometry loading.


392-398: LGTM! Consistent with other spatial elements.

PointsElement follows the established pattern for validation and store access.

scripts/validate-all.sh (2)

37-47: Python validation step is well-structured.

The script correctly captures both stdout and stderr to a file while still allowing the script to fail on error. The dual-pass approach (JSON then Markdown) is a reasonable pattern for generating multiple output formats.


112-115: Relative symlink creation is correct.

Using a relative path ($TIMESTAMP) for the symlink target ensures portability. The rm -f before ln -s handles existing symlinks properly.

python/v0.6.1/generate_fixtures.py (1)

52-71: Exception handling with multiple fallbacks is reasonable for cross-version compatibility.

The cascading fallback approach (write → write without overwrite → write_zarr → sd.io.write_zarr) handles API variations across spatialdata versions well. The nested try-except structure, while verbose, ensures maximum compatibility.

scripts/validate-datasets-js.js (3)

9-11: Import from dist directory requires build step.

The import from ../packages/core/dist/index.js requires the package to be built first. This is handled by validate-all.sh but should be documented in the script's help text or comments.


60-114: Dataset validation function is well-implemented.

Good error handling with structured result objects, stack trace capture (limited to 5 lines), and flexible proxy support. The element extraction logic handles the SpatialData structure correctly.


358-361: Top-level error handling is appropriate.

The .catch() handler ensures any unhandled promise rejections result in a non-zero exit code, which is important for CI integration.

packages/core/src/schemas/index.ts (4)

257-283: V0.5.0 schema correctly captures top-level multiscales structure.

The schema appropriately uses .passthrough() to preserve unvalidated fields, which is important for forward compatibility.


285-315: V0.6.1 schema correctly captures nested OME structure.

The nested ome object with its own .passthrough() allows for extension fields within the ome namespace while still validating the core structure.


347-364: Explicit RasterAttrs type provides clear contract for consumers.

The explicit type definition with [key: string]: unknown index signature correctly represents the normalized output structure and allows for additional passthrough fields.


322-345: Transform logic correctly normalizes both schema formats.

The transform works as intended:

  1. Detects v0.6.1 format by checking for 'ome' key
  2. Extracts multiscales/omero from nested structure
  3. Preserves spatialdata_attrs and other top-level fields
  4. Falls through for v0.5.0 format

Type assertions (as RasterAttrs, as RasterAttrs['multiscales']) are necessary given the transform's complexity and are already covered by comprehensive unit tests in tests/unit/schemas.test.ts that validate both v0.5.0 and v0.6.1 formats, including omero data preservation.

python/scripts/validate_datasets.py (5)

1-19: LGTM!

Imports are appropriate and well-organized for the script's functionality.


21-36: LGTM!

Clean dataclass definition with appropriate use of Optional for nullable fields.


290-326: LGTM!

Clean CSV generation implementation. Local imports are acceptable here since this output format may not always be used.


329-332: LGTM!

Correct wrapper pattern for multiprocessing pool.


455-463: LGTM!

Multiprocessing with imap provides ordered results while allowing progress reporting. Appropriate for a validation script.

xinaesthete and others added 2 commits December 15, 2025 19:01
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
@coderabbitai coderabbitai bot (Contributor) left a comment


Actionable comments posted: 0

🧹 Nitpick comments (2)
scripts/validate-datasets-js.js (2)

60-114: Normalize thrown errors and stabilize the result schema for easier downstream consumption.

To make JSON/CSV consumers simpler and avoid surprises if a non-Error value is ever thrown, you can normalize error and always initialize stackTrace on the result:

 async function validateDataset(dataset, useProxy = false) {
   const result = {
     datasetName: dataset.name,
     datasetUrl: dataset.url,
     implementation: 'JavaScript (@spatialdata/core)',
     success: false,
     errorType: null,
     errorMessage: null,
     elements: null,
-    coordinateSystems: null,
+    coordinateSystems: null,
+    stackTrace: null,
   };

   try {
@@
-  } catch (error) {
-    result.success = false;
-    result.errorType = error.constructor.name;
-    result.errorMessage = error.message;
-
-    // Include stack trace for debugging
-    if (error.stack) {
-      result.stackTrace = error.stack.split('\n').slice(0, 5).join('\n');
-    }
-  }
+  } catch (error) {
+    const err = error instanceof Error ? error : new Error(String(error));
+    result.success = false;
+    result.errorType = err.name;
+    result.errorMessage = err.message;
+
+    if (err.stack) {
+      result.stackTrace = err.stack.split('\n').slice(0, 5).join('\n');
+    }
+  }

This keeps the schema consistent and avoids undefined accesses if something non-standard is thrown.


169-215: Consider making coordinate system names more readable in markdown output.

If result.coordinateSystems contains objects rather than plain strings, join(', ') will render them as [object Object]. You could improve readability like:

-      if (result.coordinateSystems && result.coordinateSystems.length > 0) {
-        lines.push(`**Coordinate Systems:** ${result.coordinateSystems.join(', ')}`);
+      if (result.coordinateSystems && result.coordinateSystems.length > 0) {
+        const csLabels = result.coordinateSystems.map(cs =>
+          typeof cs === 'string'
+            ? cs
+            : (cs && cs.name) || JSON.stringify(cs)
+        );
+        lines.push(`**Coordinate Systems:** ${csLabels.join(', ')}`);

If coordinateSystems are already strings, this is a no-op; otherwise it produces more meaningful labels.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6f57a74 and d12dd68.

📒 Files selected for processing (1)
  • scripts/validate-datasets-js.js (1 hunks)
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-12-15T18:47:01.953Z
Learnt from: xinaesthete
Repo: Taylor-CCB-Group/SpatialData.js PR: 15
File: scripts/validate-all.sh:52-66
Timestamp: 2025-12-15T18:47:01.953Z
Learning: In the SpatialData.js monorepo, the validation scripts (scripts/validate-all.sh and scripts/validate-datasets-js.js) only depend on spatialdata/core and its dependency spatialdata/zarrextra, not on the react or vis packages. Therefore, when optimizing builds, it is sufficient to check only packages/core/dist before deciding to skip a build. Ensure this dependency boundary is enforced in review checks and that any changes to react/vis do not alter the validation flow.

Applied to files:

  • scripts/validate-datasets-js.js
📚 Learning: 2025-12-15T18:52:01.166Z
Learnt from: xinaesthete
Repo: Taylor-CCB-Group/SpatialData.js PR: 15
File: scripts/validate-datasets-js.js:14-55
Timestamp: 2025-12-15T18:52:01.166Z
Learning: In the SpatialData official documentation's "Spatial omics datasets" page, there is a discrepancy between the "Filename" column and the actual working remote URLs in the "work with data remotely" column. When working with spatialdata datasets from S3, use the URLs from the "work with data remotely" column, not the filenames. Examples of differences: visium_hd_3.0.0_io.zarr/ vs visium_hd_3.0.0_id.zarr/ (filename vs remote URL), mouse_liver.zarr vs MouseLiver.zarr/, spacem_helanih3t3.zarr vs spacem_hepanih3t3.zarr/.

Applied to files:

  • scripts/validate-datasets-js.js
🔇 Additional comments (3)
scripts/validate-datasets-js.js (3)

9-55: Imports and dataset URL definitions look correct and respect the core-only dependency boundary.

Importing readZarr from ../packages/core/dist/index.js keeps this script depending only on the core package, and the S3 URLs mirror the “work with data remotely” column from the SpatialData docs (even where those differ from the “Filename” column), which matches the intended behavior of the validator.

Based on learnings, this aligns with the repo’s validation and dependency constraints.


329-343: CSV quoting for JSON-heavy and error fields now looks correct.

Escaping double quotes in elements, coordinateSystems, and errorMessage before wrapping them in quotes prevents malformed CSV, and given the controlled content of the other columns this is sufficient for the current usage.


223-312: CLI parsing and main validation loop look solid for the current dataset set size.

Argument handling, dataset filtering (by substring), proxy warning, and the sequential validation loop all read cleanly and should behave as expected for ~10 remote datasets. The stderr/stdout split for logs vs. report output is also appropriate for CI usage.

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
@coderabbitai coderabbitai bot (Contributor) left a comment


Actionable comments posted: 5

🤖 Fix all issues with AI agents
In `@packages/core/src/models/VTableSource.ts`:
- Around line 3-5: The file VTableSource.ts contains a triple-slash reference to
"vite/client" which introduces a compile-time dependency on Vite types; remove
the line '/// <reference types="vite/client" />' from the source file and
instead add a new ambient declaration file (e.g., vite-env.d.ts) at the package
root that declares the needed ImportMetaEnv/ImportMeta typings or references
"vite/client" so Vite-specific types are isolated from core source files; ensure
the new file is included by tsconfig (or package types) so consumers building
without Vite won't fail typechecks.

In `@packages/core/src/schemas/index.ts`:
- Around line 332-356: The transform in rasterAttrsSchema currently treats any
object with an ome key as OME-NGFF 0.5 which can misclassify v0.4 objects;
update the guard inside the transform (the branch that checks 'ome' in data) to
also verify that data.ome.multiscales exists and is an array before treating the
payload as OME 0.5—i.e., in the transform of rasterAttrsSchema (which unions
rasterAttrs_OME_04_Schema and rasterAttrs_OME_05_Schema) require that
ome.multiscales is an Array (and non-empty if you prefer stricter checking)
before extracting omeData.multiscales and omeData.omero and returning the
reconstructed RasterAttrs; otherwise fall back to returning data as RasterAttrs.

In `@python/v0.6.1/generate_fixtures.py`:
- Around line 27-29: Remove the unnecessary f-string prefix on the second print
that has no interpolation: in the conditional that compares actual_version and
version (the block using actual_version, version and the two print() calls),
change the second print(f"   This may indicate the wrong environment is
active.") to a regular string literal so the f-prefix is removed; ensure you
only alter that print so static analyzer F541 is resolved.
- Around line 81-86: Rename the unused loop variable in the enumerate loop to
signal intentional non-use: change the loop header "for i, elem in
enumerate(elements):" to use "_elem" instead of "elem" (i.e., "for i, _elem in
enumerate(elements):") so static analysis (B007) no longer flags the unused
variable; leave the rest of the loop (the print using i) unchanged.

In `@tests/integration/fixtures.test.ts`:
- Around line 55-66: The current catch block around the FetchStore test only
checks error.message.includes('fetch') so ERR_UNSUPPORTED_ESM_URL_SCHEME and
other network/startup errors slip through; update the catch in the try/catch
that wraps the FetchStore/network call (the catch handling the variable error)
to detect a broader set of failures by checking error instanceof Error and
matching message for additional tokens such as 'ERR_UNSUPPORTED_ESM_URL_SCHEME',
'ECONNREFUSED', 'connect', or other network/fetch-related substrings (or falling
back to a pragmatic heuristic like treating non-application errors as skip
candidates), and apply the same widened detection to the other similar catch
blocks referenced in the comment so the tests gracefully skip when the test
server or environment isn’t available.
🧹 Nitpick comments (10)
scripts/validate-datasets-js.js (1)

10-10: Redundant dynamic import of fs module.

writeFileSync is already imported from node:fs at line 10, but there's a dynamic import of fs at line 322 to use readFileSync. Consider using a static import for readFileSync alongside writeFileSync for consistency.

♻️ Suggested refactor
-import { writeFileSync } from 'node:fs';
+import { writeFileSync, readFileSync } from 'node:fs';

Then at line 322-324:

-      const fs = await import('node:fs');
-      const data = fs.readFileSync(args.comparePython, 'utf-8');
+      const data = readFileSync(args.comparePython, 'utf-8');

Also applies to: 321-324

README.md (1)

193-198: Add language specifier to fenced code block.

The fenced code block at line 195 is missing a language specifier, which triggers a markdownlint warning. Since this is a URL example, use text or an empty specifier.

📝 Suggested fix
 **Usage:**
 
 Proxy a remote URL by encoding it directly in the path:
-```
+```text
 http://localhost:8081/https://example.com/data.zarr/.zattrs

python/scripts/validate_datasets.py (2)

172-172: Remove unused variable e.

The exception variable e is assigned but never used. Use _ to indicate it's intentionally unused, or remove it entirely.

♻️ Suggested fix

-        except (json.JSONDecodeError, IndexError) as e:
+        except (json.JSONDecodeError, IndexError):

259-280: Remove extraneous f prefix from strings without placeholders.

Lines 260, 276, and 280 use f-strings but contain no placeholders. Remove the f prefix.

♻️ Suggested fix
-                lines.append(f"**Status:** ✅ Success")
+                lines.append("**Status:** ✅ Success")
-                lines.append(f"**Status:** ❌ Failed")
+                lines.append("**Status:** ❌ Failed")
-                lines.append(f"**Error Message:**")
+                lines.append("**Error Message:**")
python/v0.7.0/generate_fixtures.py (1)

81-84: Rename unused loop variable.

The loop variable elem is not used within the loop body. Rename it to _elem to indicate it's intentionally unused.

♻️ Suggested fix
             elif isinstance(elements, list):
                 print(f"  - {element_type}: {len(elements)} element(s)")
-                for i, elem in enumerate(elements):
+                for i, _elem in enumerate(elements):
                     print(f"    * element_{i}")
scripts/validate-all.sh (3)

37-50: Consider adding set -o pipefail for safer pipeline handling.

With set -e, if the uv run command fails but the output redirection succeeds, the script will still exit. However, if you ever change this to use pipelines (e.g., command | tee file), errors in the first command could be masked. Adding set -o pipefail near line 6 would make the script more robust.

💡 Suggested improvement
-set -e  # Exit on error
+set -eo pipefail  # Exit on error and fail on pipeline errors

96-99: Missing dependency check for jq.

The script assumes jq is available but doesn't verify this. If jq is missing, the script will fail with an unhelpful error during the summary phase after already completing the time-consuming validation steps.

🛡️ Proposed fix to add dependency check

Add this near the top of the script (after the color definitions):

# Check required dependencies
for cmd in jq uv node pnpm; do
  if ! command -v "$cmd" &> /dev/null; then
    echo -e "${RED}Error: Required command '$cmd' not found.${NC}"
    exit 1
  fi
done

112-116: Symlink creation may fail silently on some systems.

The ln -s command creates a relative symlink, which is good. However, on some systems or edge cases (e.g., if $OUTPUT_DIR doesn't exist or has permission issues), this could fail. Consider adding error handling.

💡 Minor improvement for robustness
 # Create a symlink to latest results
 rm -f "$OUTPUT_DIR/latest"
-ln -s "$TIMESTAMP" "$OUTPUT_DIR/latest"
-echo -e "${GREEN}✓ Symlink created: $OUTPUT_DIR/latest${NC}"
+if ln -s "$TIMESTAMP" "$OUTPUT_DIR/latest" 2>/dev/null; then
+  echo -e "${GREEN}✓ Symlink created: $OUTPUT_DIR/latest${NC}"
+else
+  echo -e "${YELLOW}⚠ Could not create symlink (may not be supported on this filesystem)${NC}"
+fi
python/v0.6.1/generate_fixtures.py (1)

42-46: Consider moving shutil import to module level.

The import shutil inside the function works but is unconventional. Module-level imports improve readability and make dependencies explicit at the top of the file.

💡 Suggested change

Move the import to the top of the file with other imports:

 import sys
 from pathlib import Path
+import shutil

Then remove line 43:

-    # Remove existing store if it exists (to allow regenerating fixtures)
-    import shutil
+    # Remove existing store if it exists (to allow regenerating fixtures)
packages/core/src/models/VTableSource.ts (1)

27-36: Add /* @vite-ignore */ directive for external URL dynamic import.

Vite requires the /* @vite-ignore */ comment to skip import analysis on external URLs; otherwise it will attempt to prebundle/resolve and may fail or generate warnings. The webpackIgnore: true comment alone is insufficient—it only affects webpack, not Vite. Add the Vite directive alongside the existing webpack comment:

Suggested fix
-    const module = await import(/* webpackIgnore: true */ 'https://cdn.vitessce.io/parquet-wasm@2c23652/esm/parquet_wasm.js');
+    const module = await import(
+      /* @vite-ignore */ /* webpackIgnore: true */
+      'https://cdn.vitessce.io/parquet-wasm@2c23652/esm/parquet_wasm.js'
+    );

Comment on lines +332 to +356
export const rasterAttrsSchema = z
.union([rasterAttrs_OME_04_Schema, rasterAttrs_OME_05_Schema])
.transform((data): RasterAttrs => {
// Format detection is structural: if 'ome' key exists, it's OME-NGFF 0.5 format
// This is independent of spatialdata_attrs.version (which is library version metadata)
if ('ome' in data && data.ome && typeof data.ome === 'object') {
const omeData = data.ome as {
multiscales: unknown;
omero?: unknown;
[key: string]: unknown;
};
return {
multiscales: omeData.multiscales as RasterAttrs['multiscales'],
omero: omeData.omero as RasterAttrs['omero'],
spatialdata_attrs: 'spatialdata_attrs' in data ? data.spatialdata_attrs : undefined,
// Preserve any other top-level fields
...Object.fromEntries(
Object.entries(data).filter(([key]) => key !== 'ome' && key !== 'spatialdata_attrs')
),
} as RasterAttrs;
}

// Otherwise, it's already in the v0.5.0 format (top-level multiscales), return as-is
return data as RasterAttrs;
});

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

cat -n packages/core/src/schemas/index.ts | head -400 | tail -100

Repository: Taylor-CCB-Group/SpatialData.js

Length of output: 4476


🏁 Script executed:

rg "rasterAttrs_OME_04_Schema|rasterAttrs_OME_05_Schema" packages/core/src/schemas/index.ts -B 5 -A 5

Repository: Taylor-CCB-Group/SpatialData.js

Length of output: 1226


🏁 Script executed:

rg "passthrough" packages/core/src/schemas/index.ts -B 3 -A 3

Repository: Taylor-CCB-Group/SpatialData.js

Length of output: 1293


🏁 Script executed:

rg "const rasterAttrs_OME_04_Schema|const rasterAttrs_OME_05_Schema" packages/core/src/schemas/index.ts -A 30

Repository: Taylor-CCB-Group/SpatialData.js

Length of output: 1752


🏁 Script executed:

rg "RasterAttrs\['multiscales'\]" packages/core/src/schemas/index.ts -B 5 -A 5

Repository: Taylor-CCB-Group/SpatialData.js

Length of output: 577


🏁 Script executed:

# Let's check if there are any tests that might demonstrate the issue
fd "test|spec" packages/core/src -type f | head -20

Repository: Taylor-CCB-Group/SpatialData.js

Length of output: 245


🏁 Script executed:

fd "test\|spec" packages/core/src --extension ts --extension js | head -20

Repository: Taylor-CCB-Group/SpatialData.js

Length of output: 57


🏁 Script executed:

# Find test files more broadly
find packages/core -name "*.test.ts" -o -name "*.spec.ts" | head -20

Repository: Taylor-CCB-Group/SpatialData.js

Length of output: 106


🏁 Script executed:

# Let's understand the actual schema definitions and structure better
# Check rasterAttrs_OME_04_Schema - can it have an 'ome' key due to passthrough?
rg "rasterAttrs_OME_04_Schema" packages/core/src/schemas/index.ts -B 2 -A 25 | head -50

Repository: Taylor-CCB-Group/SpatialData.js

Length of output: 1810


🏁 Script executed:

# Let me check if there are any usages or tests showing this scenario
rg "rasterAttrsSchema" packages/core/src --include="*.ts" | head -15

Repository: Taylor-CCB-Group/SpatialData.js

Length of output: 516


🏁 Script executed:

# Check the RasterAttrs type definition to understand expected structure
rg "type RasterAttrs" packages/core/src/schemas/index.ts -A 15

Repository: Taylor-CCB-Group/SpatialData.js

Length of output: 535


🏁 Script executed:

# Let me check how rasterAttrsSchema is used/tested
rg "rasterAttrsSchema" packages/core/src -A 3 -B 1

Repository: Taylor-CCB-Group/SpatialData.js

Length of output: 1073


🏁 Script executed:

# Check if there are any examples or test data in the codebase
find packages/core -name "*.json" -o -name "*.yml" -o -name "*.yaml" | head -20

Repository: Taylor-CCB-Group/SpatialData.js

Length of output: 166


🏁 Script executed:

# Let's check the OME_05_Schema more carefully, especially the multiscales validation
cat -n packages/core/src/schemas/index.ts | sed -n '275,330p'

Repository: Taylor-CCB-Group/SpatialData.js

Length of output: 2289


🏁 Script executed:

# Check the actual validation - does OME_05 schema enforce structure on multiscales?
rg "rasterAttrs_OME_05_Schema" packages/core/src/schemas/index.ts -A 40

Repository: Taylor-CCB-Group/SpatialData.js

Length of output: 2977


Validate ome.multiscales structure before treating data as OME-NGFF 0.5 format.

The current check ('ome' in data && data.ome && typeof data.ome === 'object') can misclassify data if an extra ome field exists on a v0.5.0 attrs object. Since rasterAttrs_OME_04_Schema has .passthrough(), a v0.4 format object with an unrelated ome field (e.g., {multiscales: [...], ome: {foo: 123}}) would parse successfully, then the transform would incorrectly treat it as OME 0.5 and extract omeData.multiscales which would be undefined.

Strengthen the guard to check that ome.multiscales is actually an array before branching, ensuring format detection is based on the presence of the required nested structure, not just the ome key.

🛠️ Suggested guard for `ome` detection
-    if ('ome' in data && data.ome && typeof data.ome === 'object') {
-      const omeData = data.ome as {
+    const ome = (data as { ome?: { multiscales?: unknown } }).ome;
+    if (
+      ome &&
+      typeof ome === 'object' &&
+      Array.isArray((ome as { multiscales?: unknown }).multiscales)
+    ) {
+      const omeData = ome as {
         multiscales: unknown;
         omero?: unknown;
         [key: string]: unknown;
       };

Comment on lines +55 to +66
} catch (error) {
// If FetchStore doesn't support file:// or server isn't running,
// we'll skip this test with a helpful message
if (error instanceof Error && error.message.includes('fetch')) {
console.warn(
'Skipping integration test - test server not running. ' +
'Start it with: pnpm test:server'
);
return;
}
throw error;
}

⚠️ Potential issue | 🟡 Minor

Error detection may miss ESM URL scheme errors.

The skip logic checks if the error message includes 'fetch', but the pipeline failure shows ERR_UNSUPPORTED_ESM_URL_SCHEME. This error won't trigger the skip path, causing the test to fail instead of gracefully skipping.

Consider broadening the error detection to handle network-related errors more comprehensively.

🔧 Suggested fix
     } catch (error) {
       // If FetchStore doesn't support file:// or server isn't running,
       // we'll skip this test with a helpful message
-      if (error instanceof Error && error.message.includes('fetch')) {
+      if (error instanceof Error && (
+        error.message.includes('fetch') ||
+        error.message.includes('ECONNREFUSED') ||
+        error.message.includes('ERR_UNSUPPORTED')
+      )) {
         console.warn(
           'Skipping integration test - test server not running. ' +
           'Start it with: pnpm test:server'
         );
         return;
       }
       throw error;
     }

Apply similar changes to the other test blocks (lines 86-91, 106-111, 124-129).


@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@scripts/validate-datasets-js.js`:
- Around line 333-336: The fatal error handler currently logs and exits but
doesn't stop the proxy server if it was started; modify the catch block for
main() to check for the proxy server instance (e.g., proxyServer) and perform a
proper shutdown (call proxyServer.close() and await/handle its completion or
errors) before process.exit(1), or move proxyServer to module scope so the
handler can access and close it; ensure any errors from shutdown are logged and
do not leak the server.
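
A minimal sketch of the shutdown path being asked for, assuming the proxy handle comes from startProxyServer in scripts/cors-proxy.js and that moving it to module scope is acceptable (both assumptions about the script's structure):

```ts
// Sketch only: keep the proxy handle where the fatal-error handler can reach it.
// startProxyServer and the './cors-proxy.js' path are assumptions based on this review.
import { startProxyServer } from './cors-proxy.js';

let proxyServer: { close: (cb?: () => void) => void } | null = null;

async function main(): Promise<void> {
  ({ server: proxyServer } = await startProxyServer());
  // ... run dataset validations and write the report ...
}

main().catch(async (error) => {
  console.error('Fatal error:', error);
  const server = proxyServer;
  if (server) {
    // http.Server#close takes a callback, so wrap it to await completion
    await new Promise<void>((resolve) => server.close(() => resolve()));
  }
  process.exit(1);
});
```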
🧹 Nitpick comments (5)
README.md (1)

196-199: Add language identifier to fenced code block.

The code block at line 197 is missing a language specifier. Since this is showing a URL format/pattern, use text or plaintext to satisfy markdown linting.

📝 Suggested fix
 **Usage:**
 
 Proxy a remote URL using query parameter:
-```
+```text
 http://localhost:8081/?url=https://example.com/data.zarr/.zattrs

scripts/cors-proxy.js (3)

12-16: Variable shadowing: resolve from 'node:path' is shadowed in Promise callbacks.

The resolve import from 'node:path' is shadowed by the Promise resolve parameter in createProxyServer (line 175). While this works because the import isn't used in that scope, it can cause confusion. Consider renaming the import.



♻️ Suggested fix

 import { createServer } from 'node:http';
 import { URL, fileURLToPath } from 'node:url';
-import { resolve } from 'node:path';
+import { resolve as resolvePath } from 'node:path';

Then update line 211:

-const isMainModule = process.argv[1] && resolve(process.argv[1]) === __filename;
+const isMainModule = process.argv[1] && resolvePath(process.argv[1]) === __filename;

55-66: Consider including PATCH method in body forwarding.

The body forwarding logic handles POST and PUT, but PATCH requests also commonly include a body. This could cause issues if any proxied APIs use PATCH with payloads.

♻️ Suggested fix
     // Forward request body for POST/PUT
     let body = null;
-    if (req.method === 'POST' || req.method === 'PUT') {
+    if (req.method === 'POST' || req.method === 'PUT' || req.method === 'PATCH') {
       const chunks = [];
       for await (const chunk of req) {
         chunks.push(chunk);

191-206: Unbounded recursion risk in port retry logic.

The startProxyServer function recursively retries on the next port when EADDRINUSE occurs. In edge cases where many consecutive ports are occupied, this could lead to deep recursion or unexpected behavior. Consider adding a maximum retry limit.

🛡️ Suggested fix with retry limit
-export async function startProxyServer(port = 8081) {
+export async function startProxyServer(port = 8081, maxRetries = 10) {
+  if (maxRetries <= 0) {
+    throw new Error(`Could not find an available port after multiple retries (starting from ${port - 10})`);
+  }
   try {
     const server = await createProxyServer(port);
     // Get the actual port (in case it was changed due to port conflict)
     const actualPort = server.address()?.port || port;
     const baseUrl = `http://localhost:${actualPort}`;
     return { server, baseUrl };
   } catch (error) {
     // If port is in use, try next port
     if (error.code === 'EADDRINUSE') {
       console.error(`Port ${port} is in use, trying ${port + 1}...`);
-      return startProxyServer(port + 1);
+      return startProxyServer(port + 1, maxRetries - 1);
     }
     throw error;
   }
 }
scripts/validate-datasets-js.js (1)

298-307: CSV escaping is incomplete for some fields.

While elements, coordinateSystems, and errorMessage are now properly escaped, the datasetName, datasetUrl, and implementation fields are not. If any dataset name or URL contains double quotes (unlikely but possible), the CSV will be malformed.

♻️ Suggested fix for complete CSV escaping
   } else if (args.outputFormat === 'csv') {
     // Simple CSV generation
+    const csvEscape = (str) => str ? String(str).replace(/"/g, '""') : '';
     const lines = ['Dataset Name,Dataset URL,Implementation,Success,Error Type,Error Message,Elements,Coordinate Systems'];
     for (const r of results) {
       const elements = r.elements ? JSON.stringify(r.elements).replace(/"/g, '""') : '';
       const cs = r.coordinateSystems ? JSON.stringify(r.coordinateSystems).replace(/"/g, '""') : '';
-      const errorMsg = (r.errorMessage || '').replace(/"/g, '""');
-      lines.push(`"${r.datasetName}","${r.datasetUrl}","${r.implementation}",${r.success},"${r.errorType || ''}","${errorMsg}","${elements}","${cs}"`);
+      lines.push(`"${csvEscape(r.datasetName)}","${csvEscape(r.datasetUrl)}","${csvEscape(r.implementation)}",${r.success},"${csvEscape(r.errorType)}","${csvEscape(r.errorMessage)}","${elements}","${cs}"`);
     }
     output = lines.join('\n');

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 5

🤖 Fix all issues with AI agents
In `@packages/zarrextra/src/index.ts`:
- Around line 63-70: The loop that walks pathParts uses
getZattrs(normalizedPath, store) for every node, causing parent nodes to receive
leaf attributes; change the attrs lookup to use the current node's prefix path
(build the prefix from pathParts[0..i] or maintain a runningPrefix) and call
getZattrs(prefixAsAbsolutePath, store) so each created node (in the loop that
references pathParts, currentNode, isLeaf, isArray) gets attributes for its own
path rather than the final normalizedPath.
- Around line 173-178: The current regex in the loop over v2Metadata.metadata
(matching flatPath.match(/^(.+?)\/(\.zattrs|\.zarray|\.zgroup)$/)) requires a
path prefix and thus skips root keys like ".zattrs" or ".zgroup"; update the
matching logic so the path prefix is optional, capture the metadata type into
metadataType, and when the captured path is missing set path to the root (e.g.,
empty string) before using pathGroups; adjust the conditional that creates/uses
pathGroups to handle the root path as well so root attributes are preserved.

In `@packages/zarrextra/src/zarrSchema.ts`:
- Around line 170-178: In validateAndConvertV2Zarray, the code uses the 'in'
operator on zarray which will throw for null/non-object inputs; first guard
against non-objects by validating that zarray is a non-null object (e.g., typeof
zarray === 'object' && zarray !== null) before casting and computing hasV3Fields
/ hasV2Fields, and if the check fails throw or return a clear validation error;
update the early code around the hasV3Fields/hasV2Fields checks to use that
guard and reference validateAndConvertV2Zarray, obj, hasV3Fields, and
hasV2Fields when making the change (a sketch of this guard follows this list).
- Around line 121-132: v3ZarrayBaseSchema currently omits the array-level
attributes field so validateV3Zarray drops any incoming metadata; add an
attributes entry to v3ZarrayBaseSchema (e.g., attributes:
z.record(z.unknown()).optional().nullable()) and then update validateV3Zarray
(the validator/normalizer logic that builds the output object) to preserve the
incoming attributes instead of defaulting to {} so array-level attributes from
the input metadata are retained.

In `@python/scripts/validate_datasets.py`:
- Line 104: The call currently double-quotes the json.dumps result causing
invalid syntax; change the line using sd.read_zarr so it does not wrap
json.dumps(...) in extra string quotes — e.g. replace
sd.read_zarr("{json.dumps(dataset['url'])}") with
sd.read_zarr({json.dumps(dataset['url'])}) (or simply
sd.read_zarr(dataset['url']) if you want to pass the raw value); update the
sd.read_zarr call that references dataset['url'] and json.dumps accordingly.
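
For the validateAndConvertV2Zarray item above, a minimal sketch of the guard being asked for; the specific fields checked for hasV3Fields/hasV2Fields are illustrative, since the review only names the variables:

```ts
// Sketch: reject non-object input before using the 'in' operator,
// which throws on null/undefined and misbehaves on primitives.
function validateAndConvertV2Zarray(zarray: unknown) {
  if (typeof zarray !== 'object' || zarray === null) {
    throw new Error('Expected .zarray metadata to be a JSON object');
  }
  const obj = zarray as Record<string, unknown>;
  // Field names below are illustrative of typical v3 vs v2 zarray keys.
  const hasV3Fields = 'data_type' in obj || 'chunk_grid' in obj;
  const hasV2Fields = 'dtype' in obj || 'chunks' in obj;
  // ... continue with the existing conversion logic ...
  return { obj, hasV3Fields, hasV2Fields };
}
```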
🧹 Nitpick comments (1)
python/scripts/validate_datasets.py (1)

172-172: Remove unused variable e.

The exception variable is captured but never used.

♻️ Proposed fix
-        except (json.JSONDecodeError, IndexError) as e:
+        except (json.JSONDecodeError, IndexError):

Comment on lines 121 to 132
const v3ZarrayBaseSchema = z.object({
shape: z.array(nonNegativeInteger), // Empty array allowed for scalars
data_type: z.string().min(1),
chunk_grid: v3ChunkGridSchema,
chunk_key_encoding: v3ChunkKeyEncodingSchema.optional(),
fill_value: z.union([z.number(), z.string(), z.boolean()]).optional(),
codecs: z.array(v3CodecSchema).optional().nullable(),
dimension_names: z.array(z.string()).optional().nullable(),
storage_transformers: z.array(z.unknown()).optional().nullable(),
zarr_format: z.number().optional(),
node_type: z.literal('array').optional()
});

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🌐 Web query:

Zarr v3 array metadata attributes consolidated metadata specification

💡 Result:

Zarr v3 array metadata: attributes

In Zarr v3, each array has a zarr.json metadata document. Besides required fields (e.g., shape, data_type, codecs, etc.), arrays may include an optional attributes member:

  • attributes: a JSON object for arbitrary user/application metadata (keys are strings; values are any JSON literals). The core spec shows attributes inside an array’s zarr.json example. [1]

(Similarly, groups also have optional attributes in their zarr.json.) [1]

Consolidated metadata in Zarr v3: spec status

There is no finalized, core Zarr v3 consolidated-metadata specification in the Zarr v3 core spec. Instead:

  • Zarr-Python supports consolidated metadata for v3 but flags it as experimental, noting that a formal v3 extension has been proposed (referenced as zarr-specs#309 in the docs). [2]
  • The zarr-specs repository tracks this as a protocol-extension discussion (“consolidated metadata storage for Zarr v3”), explicitly stating that consolidated metadata storage is not part of the v3 spec and likely needs an extension definition. [3]

Practical takeaway: in v3 today, attributes are standard and live in each node’s zarr.json, but “consolidated metadata” is currently an implementation feature/proposed extension, not a settled part of the v3 core spec. [1][2][3]

Sources: [1] Zarr v3 core specification (metadata / array & group attributes) [2] Zarr-Python consolidated metadata docs (v3 is experimental; points to zarr-specs proposal) [3] zarr-specs issue on consolidated metadata for v3 (extension needed)
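
To make the summary above concrete, a hand-written illustration (not copied from the spec) of a v3 array zarr.json carrying an attributes member might look like:

```ts
// Illustrative Zarr v3 array metadata (zarr.json); values are made up.
const exampleV3ArrayMetadata = {
  zarr_format: 3,
  node_type: 'array',
  shape: [2, 512, 512],
  data_type: 'uint16',
  chunk_grid: { name: 'regular', configuration: { chunk_shape: [1, 256, 256] } },
  chunk_key_encoding: { name: 'default', configuration: { separator: '/' } },
  fill_value: 0,
  codecs: [{ name: 'bytes', configuration: { endian: 'little' } }],
  // Optional member for arbitrary user/application metadata:
  attributes: { description: 'acquired 2024-01-01', channel_names: ['DAPI', 'GFP'] },
};
```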


🏁 Script executed:

# Find and examine the file
find . -name "zarrSchema.ts" -type f

Repository: Taylor-CCB-Group/SpatialData.js

Length of output: 113


🏁 Script executed:

# Read the zarrSchema.ts file to examine the actual code
wc -l ./packages/zarrextra/src/zarrSchema.ts

Repository: Taylor-CCB-Group/SpatialData.js

Length of output: 117


🏁 Script executed:

# Read the entire file to see the actual code
cat -n ./packages/zarrextra/src/zarrSchema.ts

Repository: Taylor-CCB-Group/SpatialData.js

Length of output: 10539


Add attributes field to v3 array schema and preserve from input metadata.

Lines 121-132 and 259-270: v3ZarrayBaseSchema omits attributes, causing validateV3Zarray to always return attributes: {} and discard any array-level attributes from the input metadata.

Suggested fix
 const v3ZarrayBaseSchema = z.object({
   shape: z.array(nonNegativeInteger), // Empty array allowed for scalars
   data_type: z.string().min(1),
+  attributes: z.record(z.string(), z.unknown()).optional(),
   chunk_grid: v3ChunkGridSchema,
   chunk_key_encoding: v3ChunkKeyEncodingSchema.optional(),
   fill_value: z.union([z.number(), z.string(), z.boolean()]).optional(),
   codecs: z.array(v3CodecSchema).optional().nullable(),
   dimension_names: z.array(z.string()).optional().nullable(),
   storage_transformers: z.array(z.unknown()).optional().nullable(),
   zarr_format: z.number().optional(),
   node_type: z.literal('array').optional()
 });
   return {
     shape: v3.shape,
     data_type: v3.data_type,
     chunk_grid: v3.chunk_grid,
     chunk_key_encoding: v3.chunk_key_encoding ?? {
       name: 'default',
       configuration: { separator: '/' }
     },
     fill_value: v3.fill_value ?? 0,
     codecs: v3.codecs ?? [],
-    attributes: {},
+    attributes: v3.attributes ?? {},
     dimension_names: v3.dimension_names ?? [],
     zarr_format: 3,
     node_type: 'array',
     storage_transformers: v3.storage_transformers ?? []
   };

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 5

🤖 Fix all issues with AI agents
In `@packages/zarrextra/src/index.ts`:
- Around line 63-93: The code calls getZattrs with the full normalizedPath for
every path segment inside the for-loop (for ... of pathParts.entries()), causing
intermediate group nodes to inherit leaf attributes; change this to compute the
per-segment prefix path (e.g. join the first i+1 parts or build incrementally)
and pass that segment-specific path to getZattrs instead of normalizedPath (keep
using normalizedPath or the full path when isLeaf true); update the variable
used for attrs (currently attrs) and any captured path (arrayPath) accordingly
so each node uses its own segment path while leaving leaf behavior unchanged.
- Around line 169-193: The current grouping loop over v2Metadata.metadata drops
root-level entries because the regex in the path matching (used in the block
that populates pathGroups) requires a non-empty prefix before
"/.zattrs|.zarray|.zgroup"; update the match to allow an optional path prefix
(so it matches both ".zattrs" and "some/path/.zattrs"), and when creating the
group key use a stable representation for root (e.g., empty string or '/' as the
path key) so pathGroups contains root entries; also update any downstream logic
(parseStoreContents) to recognize and handle that root key representation when
converting groups to final outputs. Ensure you modify the matching/assignment
logic around v2Metadata.metadata, pathGroups, and the
group.zattrs/.zarray/.zgroup assignments (a sketch of the optional-prefix match follows this list).

In `@packages/zarrextra/src/zarrSchema.ts`:
- Around line 74-163: The v2 and v3 zarray schemas currently reject valid Zarr
"overhang" chunk sizes and disallow null fill values for v2; update
v2ZarraySchema by removing the refine that enforces each chunk <= corresponding
shape (the refine referencing data.chunks.every(...)) and add z.null() to the v2
fill_value union in v2ZarrayBaseSchema (the fill_value definition), and update
v3ZarraySchema by removing the analogous refine that enforces
chunk_grid.configuration.chunk_shape.every(...); keep the other refines (length
equality and dimension_names) intact so only the bounds checks are dropped and
v2 fill_value accepts null.

In `@python/scripts/validate_datasets.py`:
- Line 104: The call to sd.read_zarr wraps json.dumps(dataset['url']) in quotes
producing a double-quoted string and invalid Python; change the argument to pass
the raw URL string instead (use dataset['url'] directly) so the call in the
sdata assignment (sd.read_zarr(...)) receives a proper Python str rather than an
already-quoted JSON string.
- Line 172: The except clause currently captures an unused variable `e` in
"except (json.JSONDecodeError, IndexError) as e:"; either remove the binding so
it reads "except (json.JSONDecodeError, IndexError):" or keep `as e` and use it
in the error handling (e.g., include `e` in the log/raise message) inside the
same try/except block in python/scripts/validate_datasets.py to eliminate the
unused-variable warning and preserve useful debug info.
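
For the root-level grouping item above, a minimal sketch of a match that also accepts prefix-less keys, using the empty string as the root key; variable names follow the review text, while the real function lives in packages/zarrextra/src/index.ts:

```ts
// Sketch: group consolidated-metadata keys by node path, treating root-level
// '.zattrs' / '.zgroup' / '.zarray' entries as belonging to path '' (the root).
function groupConsolidatedKeys(metadata: Record<string, unknown>) {
  const pathGroups = new Map<string, Record<string, unknown>>();
  for (const [flatPath, value] of Object.entries(metadata)) {
    // The leading "path/" part is optional so root keys like '.zattrs' match too.
    const match = flatPath.match(/^(?:(.+)\/)?(\.zattrs|\.zarray|\.zgroup)$/);
    if (!match) continue;
    const path = match[1] ?? ''; // missing capture group => root node
    const group = pathGroups.get(path) ?? {};
    group[match[2]] = value;
    pathGroups.set(path, group);
  }
  return pathGroups;
}
```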
🧹 Nitpick comments (1)
python/scripts/validate_datasets.py (1)

219-230: Consider centralizing version constants.

The versions "0.5.0", "0.6.1", "0.7.0" are hardcoded here, in argparse.choices (line 344), and in the default versions list (line 401). If versions are added or removed, multiple locations need updating.

♻️ Suggested approach

Define a single constant near the top of the file:

SUPPORTED_VERSIONS = ["0.5.0", "0.6.1", "0.7.0"]

Then reference it in argparse, the default versions list, and dynamically generate table headers/columns.

elements=output.get("elements"),
coordinate_systems=output.get("coordinate_systems"),
)
except (json.JSONDecodeError, IndexError) as e:

⚠️ Potential issue | 🟡 Minor

Remove unused exception variable.

The variable e is captured but never used. Either remove the binding or use it in the error message for debugging.

🔧 Proposed fix
-        except (json.JSONDecodeError, IndexError) as e:
+        except (json.JSONDecodeError, IndexError):
🧰 Tools
🪛 Ruff (0.14.14)

[error] 172-172: Local variable e is assigned to but never used

Remove assignment to unused variable e

(F841)


@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@packages/core/tests/schemas.spec.ts`:
- Around line 230-232: The comment on spatialdata_attrs in the shapesAttrsSchema
test is misleading; update the comment next to spatialdata_attrs (in the
shapesAttrsSchema declaration/expected object) to state that this is common
spatial metadata shared across element types (not "Library version metadata (for
raster elements)"), so replace or reword the comment to reflect that
spatialdata_attrs is common metadata across element types and may include
version info like '0.6.1'.
🧹 Nitpick comments (6)
packages/core/tests/schemas.spec.ts (4)

13-13: Clarify or address the TODO comment.

The comment "TODO: invert this type..." is ambiguous. Consider either:

  • Adding more context about what needs to be inverted and why
  • Creating a tracking issue if this is deferred work
  • Removing if no longer applicable

117-119: Consider consolidating redundant parse calls.

The pattern of calling expect().not.toThrow() followed by another parse() is repeated throughout. You can simplify by parsing once and asserting on the result.

♻️ Suggested consolidation
-      expect(() => coordinateTransformationSchema.parse(transform)).not.toThrow();
-      const result = coordinateTransformationSchema.parse(transform);
-      expect(result[0].type).toBe('scale');
+      const result = coordinateTransformationSchema.parse(transform);
+      expect(result[0].type).toBe('scale');

If the parse succeeds, assertions on the result implicitly confirm it didn't throw. Reserve expect().not.toThrow() for tests where you only need to verify parsing succeeds without inspecting the result.


248-266: Consider adding a test for points attrs without transformations.

For consistency with shapesAttrsSchema tests (which include both with and without transformations), consider adding a similar negative case for pointsAttrsSchema.

Also, the comment on line 260 has the same issue as noted above - "(for raster elements)" is misleading for points attrs.

♻️ Suggested additional test
    it('should accept points attrs without transformations', () => {
      const attrs = {
        'encoding-type': 'ngff:points',
        axes: ['x', 'y'],
      };

      expect(() => pointsAttrsSchema.parse(attrs)).not.toThrow();
    });

293-322: Consolidate duplicate test cases.

The tests at lines 293-313 and 325-343 validate nearly identical scenarios (valid spatial data metadata with affine transformation). Similarly, lines 315-322 and 345-351 both test rejection of numeric version.

Consider consolidating these to reduce redundancy while maintaining coverage.

♻️ Suggested consolidation

Keep the more comprehensive tests and add comments noting they cover cases originally in index.test.ts:

-    // Tests from packages/core/src/index.test.ts
-    it('should validate spatial data metadata (from index.test.ts)', () => {
-      // coordinateSystems values are arrays of transformations
-      const validMetadata = {
-        version: '0.1.0',
-        coordinateSystems: {
-          global: [
-            {
-              type: 'affine',
-              affine: [
-                [1, 0, 0],
-                [0, 1, 0],
-              ],
-            },
-          ],
-        },
-      };
-
-      expect(() => spatialDataSchema.parse(validMetadata)).not.toThrow();
-    });
-
-    it('should reject invalid spatial data metadata (from index.test.ts)', () => {
-      const invalidMetadata = {
-        version: 123, // should be string
-      };
-
-      expect(() => spatialDataSchema.parse(invalidMetadata)).toThrow();
-    });

The existing tests at lines 293-322 already cover these cases adequately.

Also applies to: 325-351

packages/vis/tests/index.spec.tsx (1)

1-42: Test file structure looks good for API surface validation.

The tests appropriately verify that the public exports from @spatialdata/vis are defined and have the expected types. A few minor observations:

  1. The file extension .tsx could be .ts since there's no JSX content.
  2. The test at lines 30-41 partially duplicates checks from lines 11-17 (both verify the same component names).

Regarding the pipeline failure about @spatialdata/react: this appears to be a build/dependency resolution issue in the react package itself (possibly missing or incorrect main/module/exports in its package.json), not an issue with this test file.

packages/zarrextra/tests/zarrSchema.spec.ts (1)

130-226: Consider adding a test for v3 dimension_names length mismatch.

The v2 schema tests include a case for dimension_names length mismatch (lines 118-127), but the v3 schema tests don't have an equivalent. Based on the relevant code snippet showing that v3ZarraySchema has the same refinement rule for dimension_names, this edge case should also be tested for v3.

📝 Suggested test to add
it('should reject v3 zarray with dimension_names length mismatch', () => {
  const invalidZarray = {
    shape: [100, 200],
    data_type: 'float64',
    chunk_grid: {
      name: 'regular',
      configuration: { chunk_shape: [10, 20] },
    },
    dimension_names: ['y'], // Wrong length
  };

  expect(() => v3ZarraySchema.parse(invalidZarray)).toThrow();
});

Comment on lines +230 to +232
spatialdata_attrs: {
version: '0.6.1', // Library version metadata (for raster elements)
},

⚠️ Potential issue | 🟡 Minor

Misleading comment about version metadata scope.

The comment states "Library version metadata (for raster elements)" but this test is for shapesAttrsSchema, not raster. spatialdata_attrs is common metadata across element types.

📝 Suggested fix
         spatialdata_attrs: {
-          version: '0.6.1', // Library version metadata (for raster elements)
+          version: '0.6.1', // Library version metadata
         },

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@vitest.config.ts`:
- Around line 1-2: The config currently relies on the CommonJS __dirname which
is undefined in ESM; replace __dirname usage by deriving the directory from
import.meta.url using fileURLToPath (from 'url') and then use resolve (or
path.resolve) with that directory; update any places referencing __dirname
(e.g., where resolve(__dirname, ... ) is used) to use the computed dirname
variable so Vitest config (defineConfig) runs under ESM without ReferenceError.
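
A minimal sketch of the ESM-safe pattern described above; the alias entry is illustrative and the real config's options may differ:

```ts
// vitest.config.ts — derive a directory path under ESM, where the
// CommonJS __dirname global does not exist.
import { defineConfig } from 'vitest/config';
import { fileURLToPath } from 'node:url';
import { dirname, resolve } from 'node:path';

const rootDir = dirname(fileURLToPath(import.meta.url));

export default defineConfig({
  resolve: {
    // Illustrative alias only; keep whatever entries the config already has.
    alias: { '@spatialdata/core': resolve(rootDir, 'packages/core/src') },
  },
});
```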
🧹 Nitpick comments (2)
python/scripts/validate_datasets.py (2)

219-220: Consider extracting the version list as a constant.

The version list ["0.5.0", "0.6.1", "0.7.0"] appears hardcoded in multiple places (table header and detailed results iteration). This creates maintenance overhead when adding new versions.

♻️ Optional refactor to centralize versions
+SUPPORTED_VERSIONS = ["0.5.0", "0.6.1", "0.7.0"]
+
 # In generate_markdown_table:
-    lines.append("| Dataset | v0.5.0 | v0.6.1 | v0.7.0 | URL |")
-    lines.append("|---------|--------|--------|--------|-----|")
+    version_headers = " | ".join(f"v{v}" for v in SUPPORTED_VERSIONS)
+    lines.append(f"| Dataset | {version_headers} | URL |")
+    lines.append("|---------|" + "--------|" * len(SUPPORTED_VERSIONS) + "-----|")

Also applies to: 251-251


341-347: Version list is duplicated across the codebase.

The version list appears in argparse.choices (line 344), main() default (line 401), and generate_markdown_table(). Centralizing this as a module-level constant would improve maintainability.

Also applies to: 400-401

@xinaesthete xinaesthete merged commit 9c0720e into main Feb 3, 2026
2 checks passed

Successfully merging this pull request may close these issues.

Implementation is overly coupled to zarr v2